Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers

Authors

Henrik Löf

Markus Nordén

Sverker Holmgren

Abstract

On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality geographical locality, as the non-uniformity is a consequence of the physical distance between the cc-NUMA nodes. In this article, we compare the well-established method of exploiting the first-touch strategy using parallel initialization of data to an application-initiated page migration strategy as a means of increasing the geographical locality for a set of important scientific applications. Four PDE solvers parallelized using OpenMP are studied: two standard NAS NPB3.0-OMP benchmarks and two kernels from industrial applications. The solvers employ both structured and unstructured computational grids. The main conclusions of the study are (1) that geographical locality is important for the performance of the applications, and (2) that application-initiated migration outperforms the first-touch scheme in almost all cases, and in some cases even results in performance close to what is obtained if all threads and data are allocated on a single node. We also suggest that such an application-initiated migration could be made fully transparent by letting the OpenMP compiler invoke it automatically.
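The first-touch approach with parallel initialization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `alloc_first_touch` and `scale_add` are hypothetical, and it assumes the operating system places each page on the node of the thread that first writes to it. The idea is to initialize the array with the same static loop schedule that the compute loop later uses, so that each thread mostly works on pages it touched first.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and initialize them in parallel.
   Assumption: under a first-touch page placement policy, each page ends up
   on the node of the thread that first writes to it. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Static schedule: thread t initializes one contiguous chunk. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = 0.0;

    return a;
}

/* Hypothetical compute sweep using the same static schedule, so each
   thread updates the chunk it initialized above. */
void scale_add(double *a, size_t n, double alpha, double beta)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = alpha * a[i] + beta;
}
```

With GCC the sketch can be built with `-fopenmp`; without that flag the pragmas are ignored and the loops run serially, which gives the same results but no locality benefit. Whether the placement actually happens depends on the platform's page placement policy and on threads staying bound to their nodes.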

Related resources

Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers

On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement. The solver is parallelized using OpenMP and the adaptive mesh refinement makes dynamic load balancing n...


Simulation-Based Analysis of Parallel Runge-Kutta Solvers

We use simulation-based analysis to compare and investigate different shared-memory implementations of parallel and sequential embedded Runge-Kutta solvers for systems of ordinary differential equations. The results of the analysis help to provide a better understanding of the locality and scalability behavior of the implementations and can be used as a starting point for further optimizations.


Code Tiling for Improving the Cache Performance of PDE Solvers

For SOR-like PDE solvers, loop tiling either helps little in improving data locality or hurts their performance. This paper presents a novel compiler technique called code tiling for generating fast tiled codes for these solvers on uniprocessors with a memory hierarchy. Code tiling combines loop tiling with a new array layout transformation called data tiling in such a way that a significant am...


Performance Modelling for Parallel PDE Solvers on NUMA-Systems

A detailed model of the memory performance of a PDE solver running on a NUMA-system is set up. Due to the complexity of modern computers, such a detailed model inevitably is very complicated. Therefore, approximations are introduced that simplify the model and allow NUMA-systems and PDE solvers to be described conveniently. Using the simplified model, it is shown that PDE solvers using ordered ...


Efficiency improving implementation techniques for large scale matrix equation solvers (Martin Köhler, Jens Saak; Chemnitz Scientific Computing Preprints, CSC/09-10)

We address the important field of large scale matrix based algorithms in control and model order reduction. Many important tools from theory and applications in systems theory have been widely ignored during the recent decades in the context of PDE constraint optimal control problems and simulation of electric circuits. Often this is due to the fact that large scale matrices are suspected to be...


Publication date: 2004
